Keyword Extraction using Clustering and Semantic Analysis
Authors
Abstract
Keywords are a list of significant words or terms that best represent a document's content in brief and relate to its textual context. Extraction models are categorized as statistical, linguistic, machine learning, or a combination of these approaches. This paper introduces a model that extracts keywords by forming word pairs and clustering these pairs based on semantic similarity, which is computed using the Lesk algorithm and WordNet, a lexical database for the English language. The model also uses a statistical method to ensure cluster cohesion and provide more reliable results, because the final keywords are selected from these clusters. The paper also presents three other basic keyword extraction approaches, which are used to measure the efficiency of the main approach. The proposed model showed an improvement over the three other approaches in both precision and recall.
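The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a small hand-written gloss dictionary (`GLOSSES`, hypothetical) in place of WordNet, a Jaccard overlap of gloss words as a Lesk-style similarity, and simple threshold-based linking of word pairs as the clustering step. The statistical cohesion check mentioned in the abstract is not modeled. The function names `similarity`, `cluster_pairs`, and `precision_recall` are illustrative.

```python
from itertools import combinations

# Toy glosses standing in for WordNet definitions (hypothetical data).
GLOSSES = {
    "bank": "financial institution that accepts money deposits and makes loans",
    "loan": "money lent by a financial institution and repaid with interest",
    "deposit": "money placed in a financial institution account",
    "river": "large natural stream of water flowing to the sea",
    "stream": "small natural flow of water",
}
STOPWORDS = {"a", "and", "by", "in", "of", "that", "the", "to", "with"}


def gloss_tokens(word):
    """Return the content words of a word's gloss."""
    return {t for t in GLOSSES[word].split() if t not in STOPWORDS}


def similarity(w1, w2):
    """Jaccard overlap of gloss words, a Lesk-style relatedness proxy."""
    a, b = gloss_tokens(w1), gloss_tokens(w2)
    return len(a & b) / len(a | b)


def cluster_pairs(words, threshold):
    """Merge every word pair whose similarity meets the threshold."""
    parent = {w: w for w in words}

    def find(w):
        # Follow parent links to the cluster representative.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for w1, w2 in combinations(words, 2):
        if similarity(w1, w2) >= threshold:
            parent[find(w1)] = find(w2)

    clusters = {}
    for w in words:
        clusters.setdefault(find(w), set()).add(w)
    return list(clusters.values())


def precision_recall(extracted, gold):
    """Score extracted keywords against a reference keyword set."""
    extracted, gold = set(extracted), set(gold)
    hits = len(extracted & gold)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall
```

With a threshold of 0.2, `cluster_pairs(list(GLOSSES), 0.2)` groups the finance-related words separately from the water-related ones. In a fuller version, the toy glosses would be replaced by real WordNet glosses and the final keywords would be chosen from the resulting clusters.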
Similar resources
Keyword Extraction for Webpage Clusters
The volume of unstructured information presented on the Internet is constantly increasing, together with the total amount of websites and their contents. To process this vast amount of information it is important to distinguish different clusters of related webpages. Such clusters are used, for example, for template induction, keyword extraction, and recommendation algorithms. A variety of appl...
Experiments in Clustering Documents for Automatic Acquisition of Lexical Semantic Networks for Polish
The aim of this work is to explore document clustering techniques for the needs of semi-automatic construction of a lexical semantic network for Polish. Although the majority of research in this area is based on measures of distributional similarity calculated from co-occurrences of words in large collections of documents, we wanted to approach a difficult problem of meaning ambiguity resolutio...
Keyword and Keyphrase Extraction Using Centrality Measures on Collocation Networks
Keyword and keyphrase extraction is an important problem in natural language processing, with applications ranging from summarization to semantic search to document clustering. Graph-based approaches to keyword and keyphrase extraction avoid the problem of acquiring a large in-domain training corpus by applying variants of PageRank algorithm on a network of words. Although graph-based approache...
Semantic Correspondence of Database Schema from Heterogeneous Databases using Self-Organizing Map
This paper provides a framework for semantic correspondence of heterogeneous databases using a self-organizing map. It solves the problem of overlapping between different databases due to their different schemas. Clustering technique using self-organizing maps (SOM) is tested and evaluated to assess its performance when using different kinds of data. Preprocessing of database is performed prior to...
Analysis of Statistical Keyword Extraction Methods for Incremental Clustering
Incremental clustering is a very useful approach to organize dynamic text collections. Due to the time/space restrictions for incremental clustering, the textual documents must be preprocessed to maintain only their most important information. Statistical keyword extraction methods from single documents are useful in this scenario. However, different statistical methods have different assumptio...